Towards the design of optimal data redundancy schemes for heterogeneous cloud storage infrastructures

نویسندگان

  • Lluis Pamies-Juarez
  • Pedro García López
  • Marc Sánchez Artigas
  • Blas Herrera
چکیده

Nowadays, data storage requirements from end-users are growing, demanding more capacity, more reliability and the capability to access information from anywhere. Cloud storage services meet this demand by providing transparent and reliable storage solutions. Most of these solutions are built on distributed infrastructures that rely on data redundancy to guarantee a 100% of data availability. Unfortunately, existing redundancy schemes very often assume that resources are homogeneous, an assumption that may increase storage costs in heterogeneous infrastructures —e.g., clouds built of voluntary resources. In this work, we analyze how distributed redundancy schemes can be optimally deployed over heterogeneous infrastructures. Specifically, we are interested in infrastructures where nodes present different online availabilities. Considering these heterogeneities, we present a mechanism to measure data availability more precisely than existing works. Using this mechanism, we infer the optimal data placement policy that reduces the redundancy used, and then its associated overheads. In heterogeneous settings, our results show that data redundancy can be reduced up to 70%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fuzzy retrieval of encrypted data by multi-purpose data-structures

The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware, human resources, outsourcing data management, and maintenance an external organization in the form of cloud storage services, are two common approaches to overcome these challenges; The first approach costs of...

متن کامل

Cost Analysis of Redundancy Schemes for Distributed Storage Systems

Distributed storage infrastructures require the use of data redundancy to achieve high data reliability. Unfortunately, the use of redundancy introduces storage and communication overheads, which can either reduce the overall storage capacity of the system or increase its costs. To mitigate the storage and communication overhead, different redundancy schemes have been proposed. However, due to ...

متن کامل

NCCloud: applying network coding for the storage repair in a cloud-of-clouds

To provide fault tolerance for cloud storage, recent studies propose to stripe data across multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all its data, then we need to repair the lost data from other surviving clouds to preserve data redundancy. We present a proxy-based system for multiple-cloud storage called NCCloud, which aims to achieve cost-effective...

متن کامل

CodePlugin: Plugging Deduplication into Erasure Coding for Cloud Storage

Cloud storage systems play a key role in many cloud services. To tolerate multiple simultaneous disk failures and reduce the storage overhead, today cloud storage systems often employ erasure coding schemes. To simplify implementations, existing systems, such as Microsoft Azure and EMCAtmos, only support file appending operations. However, this feature leads to a nontrivial and increasing porti...

متن کامل

Storage Support for Data-Intensive Applications on Large Scale High-Performance Computing Systems

Many believe that the state-of-the-art yet decades old high-performance computing (HPC) storage would not meet the I/O requirement of the emerging exascale mainly due to the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the efficiency and availability would go towards zero as the system scales approach exascale. This work proposes a new arch...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computer Networks

دوره 55  شماره 

صفحات  -

تاریخ انتشار 2011